test(envelope-contract): pin producer→consumer envelope contracts (closes #114) #178

Merged
Lykhoyda merged 1 commit into main from test/gh-114-envelope-contract on May 20, 2026

@Lykhoyda
Owner

Summary

Closes #114 — adds 25 contract tests that pin the envelope shapes each dispatch tier emits against the handler-side parsers that consume them. A future divergence on either side fails fast in CI rather than slipping through to prod.

Background

Handler integration tests stub runAgentDevice via _setRunAgentDeviceForTest with synthetic envelopes. Codex flagged on PR #109 (conf 80) that this short-circuits the real wrapper's dispatch tiers, each of which produces subtly different shapes. The original issue listed agent-device's internal tiers (fast-runner HTTP / daemon socket / CLI subprocess); post-PR #164 (iOS-MVP) and PR #165 (Android-MVP) the surface widened to include the in-tree iOS and Android runner clients too. This PR pins the current surface.

Producer fixtures pinned

| Producer | Shape | Notes |
| --- | --- | --- |
| In-tree iOS runner | `{ref: '@e<n>', type, rect, label?, identifier?, enabled?, hittable?}` | Flat-nodes; post-`mapRunnerNodesToFlat` normalization |
| In-tree Android runner | Identical to iOS flat-nodes shape | Parity test catches divergence |
| Legacy daemon (socket) | Flat-nodes, less metadata | |
| Legacy CLI (subprocess) | Flat-nodes (currently same as daemon) | Pinned separately so a future split would surface here |
| Legacy agent-device fast-runner | Nested tree (not flat) | `findRefByTestID`'s second branch; removing it would fail this test |
| iOS typeText runner-timeout shim | `{ok:true, data: {typed, text}, meta: {sideEffectSucceeded, runnerTimeoutShim}}` | Must NOT classify as snapshot-failed |
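
For readers who prefer types to prose, the two in-tree shapes in the table can be written down roughly as follows. This is a sketch, not code from the PR: the type names, the `data.nodes` key, the layout of `rect`, the concrete field types, and the `isInTreeRef` helper are all assumptions made for illustration. Only the field names come from the table.

```typescript
// Hypothetical type names. Field names come from the table above; the layout
// of `rect`, the `data.nodes` key, and every field type are assumptions.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface FlatNode {
  ref: string; // in-tree runners emit '@e<n>'
  type: string;
  rect: Rect;
  label?: string;
  identifier?: string;
  enabled?: boolean;
  hittable?: boolean;
}

interface FlatSnapshotEnvelope {
  ok: true;
  data: { nodes: FlatNode[] };
}

interface TypeTextShimEnvelope {
  ok: true;
  data: { typed: boolean; text: string };
  meta: { sideEffectSucceeded: boolean; runnerTimeoutShim: boolean };
}

// Hypothetical helper: true when a ref follows the '@e<n>' convention.
function isInTreeRef(ref: string): boolean {
  return /^@e\d+$/.test(ref);
}

// Example fixtures with made-up values.
const iosFixture: FlatSnapshotEnvelope = {
  ok: true,
  data: {
    nodes: [
      {
        ref: '@e3',
        type: 'Button',
        rect: { x: 0, y: 0, width: 100, height: 44 },
        identifier: 'login-button',
        enabled: true,
        hittable: true,
      },
    ],
  },
};

const shimFixture: TypeTextShimEnvelope = {
  ok: true,
  data: { typed: true, text: 'hello' },
  meta: { sideEffectSucceeded: true, runnerTimeoutShim: true },
};

const allRefsValid = iosFixture.data.nodes.every((n) => isInTreeRef(n.ref));
```

Keeping the shim as its own type makes the point in the last table row visible: it is a success envelope with no nodes at all, so a consumer that equates "no nodes" with "failed" would misclassify it.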

Consumers exercised

What codex-pair caught during review

Three MED fidelity issues — a contract test with wrong fixtures is worse than no contract test because it gives false confidence. All fixed pre-commit:

  1. My initial in-tree fixtures used ref: 'app-0' style with parentIndex/depth, but mapRunnerNodesToFlat emits @e<n> refs with type/rect/enabled/hittable — verified by reading both rn-fast-runner-client.ts:488 and rn-android-runner-client.ts:230.
  2. The in-tree failure fixture used the raw HTTP error shape {error: {message, code}} — but MCP consumers see the post-failResult shape {ok:false, error: string, code: string} (per failResult(message, code) at rn-fast-runner-client.ts:564).
  3. Comment claimed daemon + CLI were pinned separately but only daemon was. Added a separate CLI fixture so the claim is honest.
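
The second issue is easier to see side by side. The sketch below contrasts the raw HTTP error shape with the shape consumers receive. `failResult` here is a stand-in that only mirrors the `failResult(message, code)` call signature cited above; its body, the `toConsumerEnvelope` helper, and the error code value are assumptions, not code from `rn-fast-runner-client.ts`.

```typescript
// Shape MCP consumers see, per the description above.
interface FailEnvelope {
  ok: false;
  error: string;
  code: string;
}

// Raw HTTP error shape the first draft of the fixture used by mistake.
interface RawHttpError {
  error: { message: string; code: string };
}

// Stand-in with the cited call signature; the body is an assumption.
function failResult(message: string, code: string): FailEnvelope {
  return { ok: false, error: message, code: code };
}

// Hypothetical helper showing where the two shapes meet.
function toConsumerEnvelope(raw: RawHttpError): FailEnvelope {
  return failResult(raw.error.message, raw.error.code);
}

// 'RUNNER_UNREACHABLE' is a made-up code for illustration.
const raw: RawHttpError = {
  error: { message: 'runner unreachable', code: 'RUNNER_UNREACHABLE' },
};
const seenByConsumer = toConsumerEnvelope(raw);

// A consumer that reads `error` as a string only works on the second shape.
const rawErrorIsString = typeof (raw as { error: unknown }).error === 'string';
const normalizedErrorIsString = typeof seenByConsumer.error === 'string';
```

A fixture built from the raw shape would exercise a path no consumer ever hits, which is why the fixture was switched to the post-`failResult` shape.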

Test plan

  • 1506/1506 cdp-bridge unit tests passing (+25 net new)
  • All 5 producer fixtures × findRefByTestID consumer → resolves expected ref by identifier
  • All 3 failure-envelope fixtures × findRefByTestID → returns null (refuses to scan failed snapshot)
  • All 5 producer fixtures × snapshotEnvelopeFailed → returns false (success)
  • All 3 failure fixtures × snapshotEnvelopeFailed → returns true
  • Empty-nodes success and runner-timeout shim both correctly classified as NOT-failed
  • codex-pair clean on final commit
  • CI green
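
The fixture × consumer rows for `findRefByTestID` can be pictured with a small stand-in. This is a sketch under assumptions, not the code in `device-batch.ts`: the `data.nodes` and `children` keys, the depth-first traversal, and all fixture values are guesses chosen to match the behavior described in the plan (flat and nested fixtures resolve a ref, failed envelopes return null).

```typescript
// Illustrative stand-ins; key names other than `ref`, `identifier`, and
// `data.tree` are assumptions.
interface TreeNode {
  ref?: string;
  identifier?: string;
  children?: TreeNode[];
}

interface Envelope {
  ok: boolean;
  data?: { nodes?: TreeNode[]; tree?: TreeNode };
  error?: string;
  code?: string;
}

function findRefByTestID(env: Envelope, testID: string): string | null {
  // Refuse to scan a failed snapshot.
  if (!env.ok || !env.data) return null;

  // Branch 1: flat-nodes shape.
  if (Array.isArray(env.data.nodes)) {
    for (const node of env.data.nodes) {
      if (node.identifier === testID && node.ref) return node.ref;
    }
    return null;
  }

  // Branch 2: nested-tree shape, walked depth-first with an explicit stack.
  if (env.data.tree) {
    const stack: TreeNode[] = [env.data.tree];
    while (stack.length > 0) {
      const node = stack.pop();
      if (!node) break;
      if (node.identifier === testID && node.ref) return node.ref;
      if (node.children) {
        for (const child of node.children) stack.push(child);
      }
    }
  }
  return null;
}

const flatFixture: Envelope = {
  ok: true,
  data: {
    nodes: [
      { ref: '@e1', identifier: 'title' },
      { ref: '@e2', identifier: 'login-button' },
    ],
  },
};

const nestedFixture: Envelope = {
  ok: true,
  data: {
    tree: {
      ref: '@e0',
      children: [{ ref: '@e4', children: [{ ref: '@e5', identifier: 'login-button' }] }],
    },
  },
};

// 'RUNNER_DOWN' is a made-up code for illustration.
const failedFixture: Envelope = { ok: false, error: 'runner unreachable', code: 'RUNNER_DOWN' };

const flatRef = findRefByTestID(flatFixture, 'login-button');
const nestedRef = findRefByTestID(nestedFixture, 'login-button');
const failedRef = findRefByTestID(failedFixture, 'login-button');
```

Deleting the second branch from this stand-in leaves the flat fixture passing while the nested one returns null, which is the kind of regression the nested-tree fixture is meant to expose.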

Refs

🤖 Generated with Claude Code

…oses #114)

Adds 25 contract tests that pin the envelope shapes each dispatch tier
emits against the handler-side parsers that consume them, so a future
divergence on either side fails fast in CI rather than slipping through
to production.

Background: handler integration tests stub `runAgentDevice` via
`_setRunAgentDeviceForTest` with synthetic envelopes. Codex flagged on
PR #109 (conf 80) that this short-circuits the real wrapper's three
dispatch tiers, each of which produces subtly different shapes. The
original issue listed agent-device's internal tiers (fast-runner HTTP /
daemon socket / CLI), but post-PR #164 (iOS-MVP) and PR #165 (Android-
MVP) the surface widened to include the in-tree iOS and Android runner
clients too. This PR covers the current surface.

Producers pinned:

  1. In-tree iOS runner (rn-fast-runner-client.runIOS) — flat nodes
     with `{ref: '@e<n>', type, rect, label?, identifier?, enabled?,
     hittable?}` shape after mapRunnerNodesToFlat normalization
  2. In-tree Android runner (rn-android-runner-client.runAndroid) —
     identical flat-node shape (the parity test pins this — a divergence
     here would silently break platform-agnostic handlers)
  3. Legacy upstream agent-device daemon socket — flat-nodes with less
     metadata
  4. Legacy upstream agent-device CLI subprocess — separate fixture
     even though current shape equals daemon, so a future divergence
     would surface here
  5. Legacy upstream agent-device internal fast-runner sub-tier —
     nested-tree shape, NOT flat. findRefByTestID's `env.data.tree`
     branch handles this; removing that branch would fail this test
     instead of passing unnoticed
  6. iOS XCUIElement.typeText runner-timeout shim — `{ok:true, data:
     {typed, text}, meta: {sideEffectSucceeded, runnerTimeoutShim}}`.
     snapshotEnvelopeFailed must NOT report this as a failure (it would
     route every successful iOS fill to SNAPSHOT_FAILED otherwise)
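
One way to picture the iOS/Android parity check from item 2 is a structural comparison of one node from each fixture. The sketch below is an assumption about how such a check could look, not the test in this PR; `shapeOf` and all fixture values are hypothetical.

```typescript
type LooseNode = Record<string, unknown>;

// Summarize a node as sorted "key:typeof" pairs so that two fixtures can be
// compared structurally, independent of their concrete values.
function shapeOf(node: LooseNode): string {
  const parts: string[] = [];
  for (const key of Object.keys(node).sort()) {
    parts.push(key + ':' + typeof node[key]);
  }
  return parts.join(',');
}

// Made-up fixture values; only the field names follow the commit message.
const iosNode: LooseNode = {
  ref: '@e1',
  type: 'Button',
  rect: { x: 0, y: 0, width: 100, height: 44 },
  identifier: 'login-button',
  enabled: true,
  hittable: true,
};

const androidNode: LooseNode = {
  ref: '@e7',
  type: 'Button',
  rect: { x: 8, y: 16, width: 120, height: 48 },
  identifier: 'login-button',
  enabled: true,
  hittable: true,
};

// The shape the first draft of the fixtures used by mistake.
const driftedNode: LooseNode = {
  ref: 'app-0',
  parentIndex: 0,
  depth: 1,
  identifier: 'login-button',
};

const parityHolds = shapeOf(iosNode) === shapeOf(androidNode);
const driftDetected = shapeOf(iosNode) !== shapeOf(driftedNode);
```

Comparing key sets and value types rather than values lets the two platforms report different refs and rects while still failing when one side adds, drops, or retypes a field.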

Consumers exercised:

  - findRefByTestID (device-batch.ts) — both flat-nodes and nested-tree
    branches
  - snapshotEnvelopeFailed (device-batch.ts) — including the critical
    distinction between empty-nodes success (TESTID_NOT_FOUND) and
    snapshot-infrastructure failure (SNAPSHOT_FAILED), per Phase 128 #5/#6
  - Edge cases: null/undefined/empty/malformed JSON all classified as failed
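
The classification rule for snapshotEnvelopeFailed can be sketched as below. This is an illustrative stand-in, not the code in device-batch.ts. It assumes the consumer receives the envelope as raw JSON text (or nothing at all) and that `ok === true` is the only success signal; both points are assumptions.

```typescript
// Illustrative stand-in for the rule described above.
function snapshotEnvelopeFailed(raw: string | null | undefined): boolean {
  // Missing or blank output counts as a snapshot-infrastructure failure.
  if (raw === null || raw === undefined || raw.trim() === '') return true;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return true; // malformed JSON
  }

  if (typeof parsed !== 'object' || parsed === null) return true;

  // Success is decided by `ok` alone, never by whether nodes are present.
  return (parsed as { ok?: unknown }).ok !== true;
}

// [label, input, expected result]; all payload values are made up.
const cases: Array<[string, string | null | undefined, boolean]> = [
  ['empty-nodes success', '{"ok":true,"data":{"nodes":[]}}', false],
  [
    'typeText runner-timeout shim',
    '{"ok":true,"data":{"typed":true,"text":"hi"},"meta":{"sideEffectSucceeded":true,"runnerTimeoutShim":true}}',
    false,
  ],
  ['failure envelope', '{"ok":false,"error":"boom","code":"X"}', true],
  ['malformed JSON', '{"ok":', true],
  ['empty string', '', true],
  ['null', null, true],
  ['undefined', undefined, true],
];

const mismatches: string[] = [];
for (const [label, input, expected] of cases) {
  if (snapshotEnvelopeFailed(input) !== expected) mismatches.push(label);
}
```

Because an empty node list still reports not-failed, a handler built on this rule would go on to report TESTID_NOT_FOUND for it and reserve SNAPSHOT_FAILED for the cases in the lower half of the table.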

codex-pair caught three fidelity issues during review (MED): my initial
fixtures used `ref: 'app-0'` style refs with `parentIndex`/`depth`
fields, but the actual mapRunnerNodesToFlat output emits `@e<n>` refs
with `type`/`rect`/`enabled`/`hittable`. The failure fixture used the
raw HTTP error shape `{error: {message, code}}` instead of the post-
failResult `{ok:false, error: string, code: string}` shape MCP consumers
actually see. And the comment claimed daemon + CLI were pinned
separately but only daemon was. All three fixed before commit — a
contract test with the wrong fixtures is worse than no contract test
because it gives false confidence.

Verified: 1506/1506 cdp-bridge unit tests passing (+25 net new).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Lykhoyda Lykhoyda merged commit 9e0a586 into main May 20, 2026
7 checks passed
@Lykhoyda Lykhoyda deleted the test/gh-114-envelope-contract branch May 20, 2026 12:28

Development

Successfully merging this pull request may close these issues.

Test coverage gap: agent-device dispatch tiers (fast-runner / daemon / CLI) under realistic envelope shapes
